Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scope: make key more random #38

Merged
merged 1 commit into from
Apr 1, 2024
Merged

Conversation

davidrans
Copy link
Member

@davidrans davidrans commented Apr 1, 2024

I think it is actually possible there's cache collisions happening here. The Scope key was only an 8 character hex value.

If the number of things we're caching under one cache scope is in the hundreds of thousands or millions I think there could be multiple collisions:
https://en.wikipedia.org/wiki/Birthday_problem#Probability_table

qa_req 0 -- Nothing really to QA here.

I think it is actually possible there's cache collisions happening here.
We were generating a cache scope which is only an 8 character hex value.

If the number of things we're caching under one cache scope is in the
hundreds of thousands or millions I think there could be multiple collisions:
https://en.wikipedia.org/wiki/Birthday_problem#Probability_table
@davidrans
Copy link
Member Author

davidrans commented Apr 1, 2024

@sterlinghirsh also suggested adding the time() to the key, as another way of making collisions less likely.

Something like:

time() . substr(md5(microtime() . $this->scopeName), 0, 8);

I'm good with either, I just didn't want to do both and make this prefix unnecessarily long, since memcached does have a max cache key length.

Copy link
Contributor

@jarstelfox jarstelfox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CR 🌵

@danielbeardsley
Copy link
Member

CR 👍

@danielbeardsley danielbeardsley merged commit e11f9b0 into master Apr 1, 2024
2 checks passed
@sctice-ifixit
Copy link
Member

I think it is actually possible there's cache collisions happening here. The Scope key was only an 8 character hex value.

If this is the case, shouldn't we be able to pretty easily validate that we get the same key for two different wikiids? I'd feel a lot more confident that we've identified the problem if we could demonstrate it.

@davidrans
Copy link
Member Author

I'd feel a lot more confident that we've identified the problem if we could demonstrate it.

Agreed. I tried and could not manage to demonstrate it.

@danielbeardsley
Copy link
Member

Collisions are easy to make if you boil it down (assuming 100k wikis):

image

@danielbeardsley
Copy link
Member

Here's that one-liner:

php -r 'for($x=0; $x<100000; $x++) {echo substr(md5(microtime() . "wiki-scope-$x"), 0, 8) . "\n";}' | sort | uniq -c | sort -nr | head

@sctice-ifixit
Copy link
Member

Collisions are easy to make if you boil it down (assuming 100k wikis):

I already know that if I generate enough random 8-character hexadecimal strings I'll get collisions, but I'm interested in validating the actual application behavior. That is, does the application actually end up generating keys with the same properties as your model? Why are the errors we see biased toward the tr and zh view languages? Why haven't we been able to see an actual collision for the wikis that end up generating the errors we're seeing?

@danielbeardsley
Copy link
Member

That is, does the application actually end up generating keys with the same properties as your model?

I think that part is trivially proven given the code in this class. How often it happens in practice is dependent on how long keys stay in memcache and how many wikis we access.

Why are the errors we see biased toward the tr and zh view languages?

Woah, TIL! That indicates it's caused by something else.

Why haven't we been able to see an actual collision for the wikis that end up generating the errors we're seeing?

I'm not sure what this means. Collisions are fleeting, not permanent (the values are stored in cache).

@sctice-ifixit
Copy link
Member

I'm not sure what this means. Collisions are fleeting, not permanent (the values are stored in cache).

Go to https://zh.ifixit.com/Device/Nikon_D80 (wikiid 7827), and note that you're shown a user profile (wikiid 262147).

php > var_dump(WikiLib::getWikiById(7827, 'zh')->wikiid);
int(262147)

So long as this behavior is reproducible, if it's caused by a cache collision, I expect to be able to get the same key when trying to look up both of those wikis from the cache. So far, I haven't been able to do that.

@davidrans
Copy link
Member Author

I never busted the cache after pushing this fix. I did that a few minutes ago (at the risk of blowing away our reproduction cases) to see if we get any new instances of the error.

@danielbeardsley
Copy link
Member

So far, I haven't been able to do that.

That's a great experiment. David and I played with it a bunch on a live machine and noticed that the cache prefix values were still short because they hadn't been expired since we deployed the change to scoping.

Also, if two scopes (A and B) point to the same prefix value (and thus are writing to the same keys) B can move on, update its prefix and the bug may remain because A still points at the values that B wrote, even though B points somewhere else now.

@sctice-ifixit
Copy link
Member

Also, if two scopes (A and B) point to the same prefix value (and thus are writing to the same keys) B can move on, update its prefix and the bug may remain because A still points at the values that B wrote, even though B points somewhere else now.

Agreed. I thought it would be unlikely that, having checked 3 collisions, the wiki that caused the collision would have a new prefix value in all cases, but maybe I was unlucky, or maybe it's not unlikely at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants